Method for nucleic acid sequencing

ABSTRACT

A method for determining the sequence of at least a portion of a single-stranded nucleic acid molecule by base-specifically labeling the exposed bases of the nucleic acid molecule using heavy element-substituted nucleotide bases which form Watson-Crick type base-pairs with the exposed bases of the nucleic acid molecule and then imaging the labeled single-stranded nucleic acid molecule using electron microscopy, e.g., transmission electron microscopy (TEM), or some other method that permits discrimination of the heavy element substituted nucleotide bases is described. The image is analyzed to determine the base sequence of at least a portion of the nucleic acid molecule.

[0001] This application claims priority from U.S. application Ser. No. 09/992,538 (U.S. Published Application 2002 0086317 A1) filed Nov. 19, 2001, which application claims priority from Japanese Application 2000-351844 filed Nov. 17, 2000, the entirety of which applications are hereby incorporated by reference.

BACKGROUND OF THE INVENTION

[0002] There are a variety of methods for sequencing nucleic acid molecules. Historically, the most common methods have been based on chemical (Maxam and Gilbert sequencing) or enzymatic (Sanger dideoxy sequencing and exonuclease-based sequencing) reactions that create specific truncated nucleic acid molecules that are then separated by electrophoretic techniques in order to determine their relative length. More recently, potentially higher throughput techniques, including pyro-sequencing and hybridization-based sequencing methods, have been developed. It has also been proposed that scanning tunneling microscopy could be used to directly visualize the sequence of a nucleic acid molecule.

SUMMARY OF THE INVENTION

[0003] The invention features a method for determining the sequence of at least a portion of a single-stranded nucleic acid molecule by base-specifically labeling the exposed bases of the nucleic acid molecule using heavy element-substituted nucleotide bases which form Watson-Crick type base-pairs with the exposed bases of the nucleic acid molecule and then imaging the labeled single-stranded nucleic acid molecule using electron microscopy, e.g., transmission electron microscopy (TEM), or some other method that permits discrimination of the heavy element-substituted nucleotide bases. The image is analyzed to determine the base sequence of at least a portion of the nucleic acid molecule.

[0004] To create a suitable image of a portion of a nucleic acid molecule, a single-stranded nucleic acid molecule is placed on a support film, preferably in an elongated state such that the spacing between bases is greater than 0.5 nm (or at least 0.6 or 0.7 nm) and then exposed to one or more base-specific labels that can be discriminated by an appropriate imaging technique such as TEM. The base-specific labels are four different modified nucleotide bases (adenine (A), guanine (G), cytosine (C) and thymine (T) or uracil (U)) each of which includes a different heavy element-containing group. These modified nucleotide bases form base-pairs with the exposed bases in the elongated single-stranded nucleic acid molecule according to normal Watson-Crick base pairing rules (A-T or A-U and G-C).

[0005] The TEM used for imaging can optionally be equipped with a tilt imaging system. Phase-contrast TEM and complex TEM can also be used for imaging the labeled molecule and distinguishing the heavy element labeled bases based on the intensity of electron scattering. The electron microscope image can be analyzed with software for discriminating heavy elements within the heavy element containing groups. When heavy elements are distinguished using such software, the electron scattering intensity measured by the microscope is used as a basis for quantitative measurement of the heavy metals and thus base identification.

[0006] The heavy elements within the heavy element containing groups used to modify the nucleotide bases are preferably metal elements having an electron scattering intensity sufficient to be detected by electron microscopy. For example, metals with an atomic number greater than 25 are suitable for use as heavy elements in the context of the methods of the invention. So that the different modified nucleotide bases can be distinguished, each must include a heavy element-containing group that differs in atomic number from the atomic number of the heavy element-containing group of each other modified nucleotide base by about 15. Thus, to obtain four distinguishable heavy element-containing groups, one for each of the four nucleotide bases, it is desirable to use combinations of four different heavy elements having an atomic number greater than, e.g., 25 and differing in atomic number by about 15. Examples of such combinations include: ₇₈Pt, ₆₃Eu, ₄₆Pd, and ₂₇Co; ₉₂U, ₇₆Os, ₄₆Pd, and ₂₆Fe; ₈₀Hg, ₆₄Gd, ₄₈Cd, and ₃₀Zn; and ₈₉Ac, ₇₄W, ₄₂Mo, and ₂₅Mn. A given heavy element containing group can include two or more (2, 3, 4, 5, 6, 7, 8, 9, 10, or more) heavy element atoms and within a single group the two or more elements can be the same or different. Where a heavy element containing group contains multiple atoms, it is the total atomic number of the heavy elements in a given group that preferably differs from the total of those in each other base-specific label by about 15.
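The spacing criterion for the four example combinations can be checked with a short illustrative Python sketch. This is not part of the described method; the function names and the default threshold of 15 are assumptions introduced here for illustration, and the atomic numbers are those given in the text.

```python
from itertools import combinations

# Atomic numbers of the elements named in the example combinations.
ATOMIC_NUMBER = {
    "Pt": 78, "Eu": 63, "Pd": 46, "Co": 27,
    "U": 92, "Os": 76, "Fe": 26,
    "Hg": 80, "Gd": 64, "Cd": 48, "Zn": 30,
    "Ac": 89, "W": 74, "Mo": 42, "Mn": 25,
}

def min_pairwise_gap(elements):
    """Smallest difference in atomic number between any two of the elements."""
    numbers = [ATOMIC_NUMBER[e] for e in elements]
    return min(abs(a - b) for a, b in combinations(numbers, 2))

def is_distinguishable(elements, min_gap=15):
    """True if every pair of elements differs by at least min_gap (assumed threshold)."""
    return min_pairwise_gap(elements) >= min_gap

EXAMPLE_COMBINATIONS = [
    ("Pt", "Eu", "Pd", "Co"),
    ("U", "Os", "Pd", "Fe"),
    ("Hg", "Gd", "Cd", "Zn"),
    ("Ac", "W", "Mo", "Mn"),
]

for combo in EXAMPLE_COMBINATIONS:
    print(combo, "minimum gap:", min_pairwise_gap(combo))
```

Under these assumptions, each of the four listed combinations has a minimum pairwise gap of 15 or 16.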

[0007] The labeled nucleic acid molecule is imaged and the image is analyzed to identify the order of the base-specific labels and thus determine the base sequence of at least a portion of the molecule. To facilitate analysis of the images, at least a portion of the nucleic acid molecule is preferably elongated when the molecule is imaged. This can be best achieved by placing the molecule on the support film in an elongated state. In cases where the nucleic acid molecule is not fully elongated, e.g., where the molecule includes at least one loop formed by the nucleic acid backbone crossing over itself, multiple images at different tilt angles are obtained using, for example, an optional tilt imaging system. In this manner it is possible to collect depth information. Image processing techniques are then used to follow the backbone and thus determine the order of bases in the nucleic acid molecule.

[0008] In one example, heavy element-substituted A is labeled with palladium (atomic number 46), a heavy element-substituted C is labeled with europium (atomic number 63), a heavy element-substituted G is labeled with platinum (atomic number 78), and a heavy element-substituted U is labeled with cobalt (atomic number 27). These four heavy element labeled bases are exposed to a single-stranded nucleic acid molecule (DNA or RNA) on a support surface. The heavy element-substituted bases selectively form base-pairs with T (or U), G, C, and A in the single-stranded nucleic acid molecule. Non-base-paired heavy element labeled bases are washed away. The heavy elements in the remaining labeled nucleotides act as reporters that can be discriminated by TEM. Thus, if a platinum atom is discerned on a TEM image at a given position, this means that the Pt-labeled G is present at that position and that the corresponding base of single-chain DNA is C. The other three bases can be similarly identified by discerning the electron scattering of their respective metal atoms.
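The decoding logic of this example can be sketched in Python as follows. The sketch assumes that the metal at each position along the backbone has already been identified from the image; the function name decode_template and the rna flag are hypothetical names introduced only for illustration.

```python
# Metal -> labeled base, as assigned in the example above.
LABEL_BASE = {"Pd": "A", "Eu": "C", "Pt": "G", "Co": "U"}

# Labeled base -> Watson-Crick partner in a DNA template strand.
PARTNER_DNA = {"A": "T", "C": "G", "G": "C", "U": "A"}

def decode_template(metals, rna=False):
    """Translate an ordered list of detected metals into the template sequence.

    Assumes each entry in `metals` is one already-identified label,
    listed in order along the nucleic acid backbone.
    """
    partner = dict(PARTNER_DNA)
    if rna:
        partner["A"] = "U"  # labeled A pairs with U in an RNA template
    return "".join(partner[LABEL_BASE[m]] for m in metals)

# A platinum spot means Pt-labeled G is bound, so the template base is C.
print(decode_template(["Pt", "Pd", "Co", "Eu"]))            # CTAG
print(decode_template(["Pt", "Pd", "Co", "Eu"], rna=True))  # CUAG
```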

[0009] The invention features a method for determining the sequence of a plurality of bases in a nucleic acid molecule, comprising: (a) providing a support surface bearing a single stranded nucleic acid molecule comprising a plurality of bases labeled with one of four different base-specific labels; (b) detecting the base-specifically labeled bases by electron microscopy; and (c) analyzing the detected base-specifically labeled bases to determine the sequence of the nucleic acid molecule.

[0010] In various embodiments: each base-specific label comprises at least one atom having an atomic number greater than 25; each base is labeled with one of four base-specific labels, wherein each of said four base-specific labels comprises a different element having an atomic weight greater than 25 and each element having an atomic weight greater than 25 differs in atomic weight from each other element having an atomic weight greater than 25 by an atomic weight of at least 15; the elements are selected from the group consisting of: Pt, Eu, Pd, Co, U, Os, Fe, Hg, Gd, Cd, Zn, Ac, W, Mo, and Mn; and the four base-specific labels consist of: (i) a label comprising a substituted adenine, (ii) a label comprising a substituted uracil or a substituted thymine, (iii) a label comprising a substituted cytosine, and (iv) a label comprising a substituted guanine. The bases are not substituted at the positions involved in Watson-Crick base-pairing, i.e., the groups normally found at these positions in the bases are present.

[0011] In certain embodiments the four elements used are: (i) Pt, Eu, Pd, and Co; (ii) U, Os, Pd and Fe; (iii) Hg, Gd, Cd and Zn; or (iv) Ac, W, Mo and Mn.

[0012] In various embodiments: N7 or N9 of the substituted adenine is substituted with a group comprising an element having an atomic weight greater than 25; N7 or N9 of the substituted guanine is substituted with a group comprising an element having an atomic weight greater than 25; N1 of the substituted cytosine is substituted with a group comprising an element having an atomic weight greater than 25; N1 of the substituted uracil is substituted with a group comprising an element having an atomic weight greater than 25; N1 of the substituted thymine is substituted with a group comprising an element having an atomic weight greater than 25; the substituted adenine has a substitution that increases the specificity of Watson-Crick type base-pairing with uracil or thymine; the substituted thymine or substituted uracil has a substitution that increases the specificity of Watson-Crick type base-pairing with adenine; the substituted cytosine has a substitution that increases the specificity of Watson-Crick type base-pairing with guanine; the substituted guanine has a substitution that increases the specificity of Watson-Crick type base-pairing with cytosine; the substituted adenine is substituted at C2 or C8 or both C2 and C8; the substituted thymine or the substituted uracil has a substitution at C5 or C6 or both C5 and C6; the substituted cytosine or the substituted uracil has a substitution at C5 or C6 or both C5 and C6; the substituted guanine is substituted at C2; and the modification is a substituted or unsubstituted alkyl group or a halogen.

[0013] In various embodiments: the 3′ nucleotide or the 5′ nucleotide of the nucleic acid molecule is covalently bound to a defined region of the support surface; the defined region of the support surface is coated with gold; the support surface bears a plurality of single-stranded nucleic acid molecules wherein the 3′ nucleotide or the 5′ nucleotide of each nucleic acid molecule is covalently bound to a defined region of the support surface; the plurality of nucleic acid molecules form a regular array; and each of the plurality of single-stranded nucleic acid molecules is covalently bound to a different defined region of the support surface.

[0014] In certain embodiments: each base-specific label comprises at least 2 atoms having an atomic number greater than 25; and each base-specific label comprises a group selected from the group consisting of: B₁₀I₉COOH, C₂B₁₀I₁₀, C₂B₁₀Br₁₀, C₂B₁₀Cl₁₀, C₂B₁₀F₁₀.

[0015] In certain embodiments: the electron microscopy is transmission electron microscopy; the transmission electron microscopy includes the use of a transmission electron microscope comprising a Zernike phase plate located behind the objective lens; and the electron microscopy comprises the use of a complex electron microscope.

[0016] The invention also features a method for determining the sequence of a plurality of bases in a nucleic acid molecule, comprising: (a) providing a support surface bearing a single stranded nucleic acid molecule comprising a plurality of bases; (b) contacting the single stranded nucleic acid molecule with four different base-specific labels in the presence of an organic solvent that is free of hydrogen bond donors and acceptors to allow non-covalent binding of the four different base-specific labels to the plurality of bases; (c) detecting the base-specifically labeled bases by electron microscopy; and (d) analyzing the detected base-specifically labeled bases to determine the sequence of the nucleic acid molecule.

[0017] The invention also features: a substituted adenine wherein the C2 or C8 position is substituted with a C₁-C₅ alkyl group or a halogen and the N7 or N9 position is substituted with a group comprising an element having an atomic weight greater than 25; a substituted guanine wherein the C8 position is substituted with a C₁-C₅ alkyl group or a halogen and the N7 or N9 position is substituted with a group comprising an element having an atomic weight greater than 25; a substituted uracil wherein the C5 or C6 position is substituted with a C₁-C₅ alkyl group or a halogen and the N1 position is substituted with a group comprising an element having an atomic weight greater than 25; and a substituted cytosine wherein the C5 or C6 position is substituted with a C₁-C₅ alkyl group or a halogen and the N1 position is substituted with a group comprising an element having an atomic weight greater than 25.

[0018] In various embodiments the element having an atomic weight greater than 25 is selected from the group consisting of: Pt, Eu, Pd, Co, U, Os, Fe, Hg, Gd, Cd, Zn, Ac, W, Mo, and Mn.

BRIEF DESCRIPTION OF THE FIGURES

[0019] FIG. 1 is a schematic depiction of the sequencing of a portion of a nucleic acid molecule according to one embodiment of the invention.

[0020] FIG. 2 is a schematic depiction of a support holding an array of elongated nucleic acid molecules.

DETAILED DESCRIPTION

[0021] FIG. 1 is a schematic depiction of the sequencing of a portion of a nucleic acid molecule according to one embodiment of the invention. A DNA molecule is obtained 10. The molecule is denatured to obtain a single strand 20. This single strand is converted to an elongated form 30 that is placed on a thin film support 40. The elongated molecule on the support is exposed to heavy element labeled nucleotide bases (A°, T°, G° and C°) 50 which selectively base pair to exposed bases 60 in the elongated molecule. TEM is used to image the resulting base-specifically labeled molecule and the heavy element labeled nucleotide bases appear as a series of spots of varying intensity depending on the atomic number of the heavy element present in the heavy element labeled nucleotide base 70. The image is analyzed 80 to determine the intensity of spots and thus the order of bases in the nucleic acid molecule.

[0022] Preparation of a Support Bearing an Elongated Nucleic Acid Molecule

[0023] The nucleic acid molecule being analyzed is placed on a specimen support suitable for imaging by electron microscopy (i.e., an EM or TEM grid). The support should be selected to avoid background noise problems associated with background shot noise and fog generated by the support. Background shot noise can be reduced by increasing the electron beam dose and fogging can be reduced by using a thin support. The specimen support is preferably a thin film that firmly holds single-stranded nucleic acid molecules, is robust enough to withstand intense electron beam irradiation, and is made of a light element that does not significantly scatter the electron beam used by the EM or TEM.

[0024] Carbon thin film and aluminum thin film can be used to hold nucleic acid molecules for EM or TEM imaging. However, these films can produce strong background fog. In addition, these films tend to decrease the contrast of labeling heavy elements bonded as base pairs. Organic support films made of lipids or denatured natural proteins (e.g., albumin and casein) or synthetic artificial proteins (e.g., polylysine) are generally more useful. Once a support film has been prepared, it is transferred to a TEM grid.

[0025] There are a number of suitable approaches for binding and elongating nucleic acid molecules that have been attached at one end to a surface. For example, a coverslip can be placed over a liquid droplet containing the attached DNA molecule. As the droplet dries, the meniscus recedes, stretching the nucleic acid molecule perpendicular to the meniscus (Bensimon et al. 1995 Phys. Rev. Lett. 74:4754). Alternatively, the Langmuir-Blodgett film method may be employed. Here, the surface bearing the nucleic acid molecule is slowly withdrawn from a buffer (Michalet et al. 1997 Science 277:1518). Other elongation methods that are useful include: electric stretching (2002 Ultramicroscopy 91:139) and spin-stretching (Yokota et al. 1999 Anal. Chem. 71:4418).

[0026] Unmodified nucleic acid molecules can be bound at one end to many surfaces. For example, when a nucleic acid molecule in pH 5.5 MES (2-[N-morpholino]ethanesulfonic acid) buffer is applied to a surface coated with silane possessing a vinyl end group, one or both ends of the molecule spontaneously bind to the surface. In general, end binding (as opposed to binding internally) occurs over a narrow pH range (about 0.2 units on hydrophobic surfaces such as polystyrene, Teflon and graphite) with the optimum pH on hydrophobic surfaces commonly being near pH 5.5 (Allemand et al. 1997 Biophysical Journal 73:2064).

[0027] A nucleic acid molecule with a thiol group at one end will bind to gold. This method is described, for example, in U.S. Pat. No. 5,472,881 (see also Rongchao et al. 2002 Nucl. Acids. Res. 30:1558 and 1993 FEBS Lett. 336:452). Alkylthiol- or disulfide-terminated nucleic acid molecules can be directly adsorbed to gold. The nucleic acid molecule can be derivatized at either end so that both 5′ and 3′ immobilization is possible. Specific binding is achieved by simply exposing the thiolated nucleic acid molecule suspended in buffer to the gold surface. This technique can be combined with electrostretching techniques to obtain elongated nucleic acid molecules affixed at one end to a gold anchor (Namasivayam et al. 2002 Anal. Chem. 74:3378).

[0028] Modification of Bases to Improve Organic Solvent Solubility and Base-Pairing Specificity

[0029] Base-specific labeling is achieved using standard nucleotide bases or derivatives thereof that are attached to a heavy element containing group. To increase the specificity of base-pair formation, the nucleotide bases or derivatives thereof are exposed to the single stranded nucleic acid molecule in an organic solvent. This is because an aqueous solvent will compete for the hydrogen bonding sites and disrupt base-pair formation.

[0030] Nucleotide base derivatives that have increased solubility in an organic solvent or that form base pairs with higher specificity are useful in the methods of the invention. The four nucleotide bases are depicted below to facilitate understanding of the modification of the bases. Formula I is adenine (A). Formula II is uracil (U). Formula III is guanine (G). Formula IV is cytosine (C). Thymine is identical to uracil except that there is a methyl group at position C5 rather than an H.

[0031] Because the base-pairing takes place in an organic solvent, the modified bases must be soluble in an organic solvent. Modifications of A at 2- and 8-positions, G at 8-position, and U, C at 5- and 6-positions will increase solubility in organic solvent. Halogenation and alkylation are useful both for improving the selectivity of pair bonding and increasing solubility in organic solvents. Modification with a hydrophobic alkyl group at a position where base-pair bonding is not hindered is desirable for increasing organic solvent solubility. Examples of the modifying group include ethyl groups, propyl groups, and cyclohexyl groups. Halogen and alkyl groups contribute to both improvement of solubility in organic solvents and improvement of the selectivity of pair bonding formation. Improvement of solubility in organic solvents can also be accomplished by introducing an alkyl group or the like to the nitrogen atom at the 7- or 9-position of A or G and to the nitrogen atom at the 1-position of U or C. In addition, a heavy element complex that is used for labeling may promote solubility in an organic solvent depending on the design. To achieve higher solubility in an organic solvent a base can be partially substituted by at least one substituent group selected from the group consisting of alkyl groups, cyclohexyl groups, halogen groups, phenyl groups, and phenol groups. In the case of adenine (A), the substituent group is located at the 2- and/or 8-positions. In the case of guanine (G), the substituent group is located at the 8-position. In cases of uracil (U) and cytosine (C), the substituent group is located at the 5- and/or 6-positions. Where two substituent groups are used, they can be identical, similar or dissimilar in kind.

[0032] Formula V depicts a U-A Watson-Crick type base-pair. Formula VI depicts a C-G Watson-Crick type base-pair.

[0033] Modified bases can be used to increase the specificity of base-pairing, e.g., to reduce the formation of G-G, G-A, or G-U base-pairs relative to the appropriate G-C base-pair. Modification can also decrease the occurrence of non-Watson-Crick base-pairing, i.e., it can reduce the formation of Hoogsteen type base-pairs. To increase specificity, A, U, G, and C can be modified with halogens (e.g., Cl, Br and I), methyl groups, ethyl groups, alkyl groups (such as cyclohexyl groups), amino groups, or the like. Modification sites can be 2- and 8-positions for A, 8-position for G, and 5- and 6-positions for U and C. For example, halogenation of A and G at 8-position, amination of A at 2-position, and halogenation of U and C at 5-position intensify pair bonding and enhance the selectivity.

[0034] The substituents at the 2-position and 8-position in A, the 8-position in G, and the 2-position and 5-position in U (T) and C include: halo, substituted or unsubstituted C₁-C₁₂ alkyl, substituted or unsubstituted C₃-C₁₀ cycloalkyl, substituted or unsubstituted C₂-C₁₂ alkenyl, substituted or unsubstituted C₅-C₁₂ cycloalkenyl, substituted or unsubstituted C₂-C₁₂ alkynyl, amino, and —NH(C₁-C₆ alkyl).

[0035] As used herein, the term “halo” or “halogen” refers to any radical of fluorine, chlorine, bromine or iodine.

[0036] The term “alkyl” refers to a hydrocarbon chain that may be a straight chain or branched chain, containing the indicated number of carbon atoms. For example, C₁-C₁₂ alkyl indicates that the group may have from 1 to 12 (inclusive) carbon atoms in it. The term “haloalkyl” refers to an alkyl in which one or more hydrogen atoms are replaced by halo, and includes alkyl moieties in which all hydrogens have been replaced by halo (e.g., perfluoroalkyl).

[0037] The term “cycloalkyl” as employed herein includes saturated cyclic, bicyclic, tricyclic, or polycyclic hydrocarbon groups having 3 to 12 carbons, wherein any ring atom capable of substitution can be substituted by a substituent. Examples of cycloalkyl moieties include, but are not limited to, cyclopentyl, norbornyl, and adamantyl.

[0038] The term “cycloalkenyl” as employed herein includes partially unsaturated, nonaromatic, cyclic, bicyclic, tricyclic, or polycyclic hydrocarbon groups having 5 to 12 carbons, preferably 5 to 8 carbons, wherein any ring atom capable of substitution can be substituted by a substituent. Examples of cycloalkenyl moieties include, but are not limited to, cyclohexenyl, cyclohexadienyl, or norbornenyl.

[0039] The term “substituents” refers to a group “substituted” on an alkyl, cycloalkyl, alkenyl, alkynyl, heterocyclyl, heterocycloalkenyl, cycloalkenyl, aryl, or heteroaryl group at any atom of that group. Suitable substituents include, without limitation, alkyl, alkenyl, alkynyl, alkoxy, acyloxy, halo, hydroxy, cyano, nitro, amino, SO₃H, sulfate, phosphate, perfluoroalkyl, perfluoroalkoxy, methylenedioxy, ethylenedioxy, carboxyl, oxo, thioxo, imino (alkyl, aryl, aralkyl), S(O)_(n) alkyl (where n is 0-2), S(O)_(n) aryl (where n is 0-2), S(O)_(n) heteroaryl (where n is 0-2), S(O)_(n) heterocyclyl (where n is 0-2), amine (mono-, di-, alkyl, cycloalkyl, aralkyl, heteroaralkyl, and combinations thereof), ester (alkyl, aralkyl, heteroaralkyl), amide (mono-, di-, alkyl, aralkyl, heteroaralkyl, and combinations thereof), sulfonamide (mono-, di-, alkyl, aralkyl, heteroaralkyl, and combinations thereof), unsubstituted aryl, unsubstituted heteroaryl, unsubstituted heterocyclyl, and unsubstituted cycloalkyl. In one aspect, the substituents on a group are independently any one single, or any subset of the aforementioned substituents.

[0040] Useful modified bases include: 8-bromopurine, 2,6-diaminopurine, 5-bromouracil, 5-iodouracil, and 5-alkyluracil. Introducing a methyl group into the 5-position of uracil to enhance the selectivity of base-pair formation gives rise to nothing other than thymine. Accordingly, thymine derivatives are also embraced in the uracil derivatives in accordance with the present invention.

[0041] Base-Pairing Conditions

[0042] The heavy-element labeled bases and the single-strand nucleic acid molecules are brought into contact in the presence of an organic solvent, preferably an organic solvent that lacks hydrogen bond donors and acceptors. Polar solvents that are ionic liquids and have a relatively high dielectric constant and lack hydrogen bond donors or acceptors will readily solubilize heavy-element labeled bases and will not interfere with base-specific binding. Thus, ionic liquids such as organic salts made of a combination of cations (imidazoliums, pyrrolidiniums, pyridiniums, phosphoniums) and anions (borates, sulfates, sulfonates, amides, imides, halogenides) are useful solvents.

[0043] Also useful are organic solvents having a dielectric constant below 10. Such solvents include: chloroform, heptane, cyclohexane, carbon tetrachloride, acetonitrile, aniline, ethyl amine, cresol, acetic acid, trichloroacetic acid, dimethyl ether, diethyl ether, toluene, toluidine, benzylamine, phenol, decanol, benzene, quinoline, morpholine, dimethyl amine, chlorobenzene, dichloromethane, dichloroethylene, tetrahydrofuran, trichloroethylene, dichlorobenzene, fluorobenzene, bromobenzene, pentanol, siloxane, and glyceride. One or more heavy element-labeled bases are dissolved in a solvent or solvent mixture that includes at least one organic solvent and then applied to the nucleic acid molecule attached to the support film. The non-polar or low-polarity organic solvent facilitates specific base-pairing. However, polar solvents, e.g., chloroform, toluene, aniline, and pentanol, are useful for solubilizing the base-specific labels. In general, any suitable relatively non-hydrogen bonding solvent can be used. The preferred solvent or solvent mixture in a given situation will depend on the particular base-specific labels used in a given labeling reaction.

[0044] Heavy Element Labels

[0045] Each of the bases or base derivatives is preferably labeled with a heavy element complex that is resistant to organic solvents and resistant to mass loss events caused by electron bombardment. For A and G (or A and G derivatives) the heavy element complex is preferably bonded, e.g., covalently, to N7 or N9. For U (or T) and C (or U or T and C derivatives), the heavy element complex is preferably bonded, e.g., covalently, to N1. Of course, it is also possible to link a heavy element containing group at both the N7 and N9 positions of A or G rather than either the N7 position or the N9 position. In this case, the number of heavy element atoms is increased. The resolution and contrast of the TEM image are enhanced, thereby allowing for improved base discrimination. Methods for synthesizing nucleotides with heavy metal labels are well known in the art (see, e.g., Moudrianakis and Beer, 1965 Proc. Natl. Acad. Sci. USA 53:564; Beer et al. 1978/1979 Chemica Scripta 14:263; and Commerford 1993 Biochemistry 10:1993).

[0046] Suitable heavy element complexes include amine complexes, benzene complexes, metallocene complexes, olefin complexes, and many other complexes. When a heavy element complex is bonded to a base, it can be a heavy element complex obtained by coordinating a metal element directly to nitrogen within a base. Furthermore, a heavy element complex may be bonded to nitrogen within a base via a so-called linker or adapter, such as a polymethylene chain or polyoxyalkylene chain.

[0047] The heavy element complex preferably includes a metal that has an electron scattering intensity sufficient to be detected by techniques such as TEM. Thus, the heavy element complex preferably includes at least one metal atom with an atomic number greater than 25, preferably greater than 30. Suitable metal atoms include: Pt, Eu, Pd, Co, U, Os, Fe, Hg, Gd, Cd, Zn, Ac, W, Mo, and Mn. Since it is important that each of the four different labeled bases be distinguishable by electron microscopy or the like, the metal atoms used in the four different labeled bases preferably differ in atomic number by at least 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25 or more. Any combination of metal atoms differing sufficiently in atomic number can be used for the four different labeled bases. Suitable combinations include: Pt, Eu, Pd and Co; U, Os, Pd and Fe; Hg, Gd, Cd and Zn; and Ac, W, Mo and Mn. Any metal can be used in combination with any base so long as the four metals differ sufficiently in atomic number.

[0048] The heavy element complex used in the present invention is not limited to heavy element complexes where each molecule includes one metal atom. A complex including plural metal atoms can also be used. The use of these complexes is more desirable in terms of discrimination between elements or atoms and discrimination of signal from noise. One example of such complexes including plural metal atoms is a disubstituted heavy element complex in which one molecule contains two metal atoms. Another example is a trisubstituted complex containing three metal atoms. A further example is an iron-sulfur cluster-metal complex including four metal atoms. The atoms included in one complex may be similar or dissimilar metal elements.

[0049] For the discrimination of bases based on the total atomic number, the partial substitution of bases by heavy halogen atoms, such as iodine and bromine, can be used. For example, the following labels are readily detectable: B₁₀I₉COOH, C₂B₁₀I₁₀, C₂B₁₀Br₁₀, C₂B₁₀Cl₁₀, C₂B₁₀F₁₀.

[0050] It should be noted that if a base with high frequency is combined with a lighter element or if a base with low frequency is combined with a heavier element, mutual interference is reduced. Examples of these combinations include: adenine-Pd, guanine-Pt, cytosine-Eu, uracil-Co and adenine-Zn, guanine-Hg, cytosine-Gd, uracil-Cd.

[0051] Formula VII depicts a Pd labeled A base-paired with a T in the single-stranded nucleic acid molecule.

[0052] Formula VIII depicts an Eu labeled C base-paired with a G in the single-stranded nucleic acid molecule.

[0053] Formula IX depicts a Pt labeled G base-paired with a C in the single-stranded nucleic acid molecule.

[0054] Formula X depicts a Co labeled U base-paired with an A in the single-stranded nucleic acid molecule.

[0055] Heavy element clusters can be used for base-specific labeling. Using such clusters increases image contrast, thereby increasing the sensitivity and reliability of base discrimination. Since the clusters are quite large, lower resolution imaging methods can be used. For example, a low voltage (e.g., 100 kV) phase-contrast electron microscope can provide adequate resolution. Suitable heavy element clusters include naturally occurring iron-sulfur clusters and their metal substitutes. For example, in a cubic iron-sulfur cluster (cubane) iron and sulfur atoms are alternately coordinated at the eight vertices of a cube, creating a cluster having four heavy atoms.

[0056] Many heavy element clusters are unstable in water. However, they do exist complexed to proteins, e.g., in ferredoxin and hydrogenase. Heavy element clusters are generally stable in organic solvents. Thus, bases labeled with heavy element clusters are stable if exposed only to organic solvents. The clusters can be attached to bases directly or via a linker or adapter. The use of heavy element clusters provides many options for creating base-specific labels since the relevant difference between any two labels is the difference in the total atomic number of the heavy elements in each cluster. For example, the relevant total atomic number of a cluster having two Fe atoms (atomic number 26) and two Mo atoms (atomic number 42) is 136 (2×26+2×42).
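The arithmetic in this example, summing the atomic numbers of the heavy atoms in a cluster, can be sketched in Python. The function name and the dictionary representation of a cluster are assumptions made here for illustration; only the heavy atoms are counted, as in the example above.

```python
# Atomic numbers of the heavy elements used in this illustration.
ATOMIC_NUMBER = {"Fe": 26, "Mo": 42, "Pd": 46, "Pt": 78}

def total_atomic_number(cluster):
    """Sum atomic number x atom count over the heavy atoms of a cluster.

    `cluster` maps an element symbol to the number of atoms of that element.
    """
    return sum(ATOMIC_NUMBER[element] * count for element, count in cluster.items())

# Two Fe atoms and two Mo atoms, as in the example in the text.
fe2mo2 = {"Fe": 2, "Mo": 2}
print(total_atomic_number(fe2mo2))  # 136

# Hypothetical comparison: an all-iron cluster with four Fe atoms.
fe4 = {"Fe": 4}
print(total_atomic_number(fe2mo2) - total_atomic_number(fe4))  # 32
```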

[0057] Formula XI depicts a cysteine-complexed heavy metal cluster.

[0058] The stability of G-C base-pairs between unsubstituted C and G substituted at N9 might be improved by substitution of C8 or the C2 amino group of G with an electron withdrawing group, e.g., NO₂, O (oxo), F, (particularly at or COH (formyl)

[0059] Imaging

[0060] An electron microscope that can achieve sufficient contrast and resolution to perform quantitative elemental analysis of one atom of a heavy element with an atomic number greater than about 25 can be used to image the labeled nucleic acid molecule. The system should be capable of quantitative detection of two elements that differ in atomic number by 35 or less, preferably 30, 25, 20, 15 or less.
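As a rough illustration of this requirement, the sketch below computes the smallest atomic number difference within a set of four label elements and compares it with the smallest difference an instrument is assumed to resolve. The function names and the threshold parameter are hypothetical. The element sets are the two example combinations given earlier (Pd, Pt, Eu, Co and Zn, Hg, Gd, Cd).

```python
from itertools import combinations

# Atomic numbers of the elements in the two example label combinations.
Z = {"Pd": 46, "Pt": 78, "Eu": 63, "Co": 27,
     "Zn": 30, "Hg": 80, "Gd": 64, "Cd": 48}


def min_atomic_number_gap(elements):
    """Smallest pairwise atomic number difference within a label set."""
    return min(abs(Z[a] - Z[b]) for a, b in combinations(elements, 2))


def resolvable(elements, smallest_resolvable_gap):
    """True if every pair of labels differs by at least the gap the
    instrument is assumed to be able to discriminate."""
    return min_atomic_number_gap(elements) >= smallest_resolvable_gap


set_1 = ["Pd", "Pt", "Eu", "Co"]
set_2 = ["Zn", "Hg", "Gd", "Cd"]
print(min_atomic_number_gap(set_1))  # 15 (Eu 63 vs Pt 78)
print(min_atomic_number_gap(set_2))  # 16
print(resolvable(set_1, 15))         # True
```

Under these assumptions, an instrument able to discriminate a difference of 15 in atomic number would suffice for either example combination.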

[0061] Commercially available TEM and enhanced versions of TEM are useful for imaging the labeled nucleic acid molecules. Phase contrast TEM is particularly useful in the methods of the invention. Phase contrast TEM can be achieved in several ways. For example, the objective lens can be defocused, allowing only a portion of the spatial frequency components in the exit wave to be expressed in the image (Scherzer (1949) J. Appl. Phys. 20:20). In this approach the resulting loss of low spatial frequency components (e.g., image alignment, low frequency object information, and particle finding) is compensated for by taking a series of defocused images and numerically reconstructing the object phase (Kirkland 1984 Ultramicroscopy 15:151 and Coene et al. 1992 Phys. Rev. Lett. 69:3743). Alternatively, energy filtering can be used to remove the inelastic contributions to the images, thereby achieving higher contrast and a better signal-to-noise ratio for the low frequency components (Schroder et al. 1990 J. Struct. Biol. 105:28).

[0062] More recently, phase contrast TEM has been achieved using a Zernike phase plate positioned in the back-focal plane of the objective lens. The phase plate is a thin material film having a center hole. Unscattered electrons pass through the hole, while scattered electrons pass through the thin film and are phase retarded. A suitable phase plate can be prepared by vacuum evaporation of carbon onto freshly cleaved mica. The film is floated on water and transferred onto a molybdenum objective lens aperture. The opening in the center of the phase plate can be produced using an ion beam. For a −π/2 phase shift, a 31 nm thick carbon film can be used for an acceleration voltage of 300 kV and a 24 nm thick carbon film can be used for an acceleration voltage of 100 kV (see Danev and Nagayama 2001 Ultramicroscopy 88:243-252 and Japanese Patent Application No. 2000-085493). To achieve a high resolution (0.2 to 0.3 nm) permitting discrimination between heavy elements, the TEM preferably has low spherical aberration and chromatic aberration, thereby achieving a high resolution limit. A high-resolution and high-contrast TEM that permits insertion of a phase plate and is capable of discriminating between individual atomic elements preferably uses a voltage of more than 300 kV. In the case of discrimination between element clusters each made up of 3 to 5 atoms, a lower voltage TEM, for example, a 100 kV TEM, can be used.

[0063] A complex electron microscope that combines a phase-contrast TEM and a standard TEM can be used to achieve resolution of 0.2 to 0.3 nm (Japanese Patent Publication No. 11-258057). For example, a complex electron microscope image consisting of a real number component signal and an imaginary number component signal is obtained by detecting the real number component TEM image of a specimen and the imaginary number component TEM image produced by phase-shifting only electron waves transmitted through the specimen by π/2 and taking the complex sum of the real number component TEM image and the imaginary number component TEM image.
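The complex sum described above can be sketched numerically as follows. This is only an illustration of combining two registered images pixel by pixel. It assumes both images are already aligned and stored as equally sized lists of rows, and the function names are hypothetical rather than taken from the cited publication.

```python
import cmath


def complex_image(real_img, imag_img):
    """Combine a real-component image and an imaginary-component image
    (same shape, lists of rows) into one complex-valued image."""
    return [[complex(r, i) for r, i in zip(real_row, imag_row)]
            for real_row, imag_row in zip(real_img, imag_img)]


def amplitude_and_phase(img):
    """Return (amplitude, phase in radians) for every pixel."""
    return [[cmath.polar(value) for value in row] for row in img]


# Toy 1x2 images: the first pixel is purely real, the second purely imaginary.
real_part = [[1.0, 0.0]]
imag_part = [[0.0, 2.0]]
combined = complex_image(real_part, imag_part)
print(combined)                       # [[(1+0j), 2j]]
print(amplitude_and_phase(combined))  # second pixel has phase pi/2
```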

[0064] Independently of the above-described methods, improved contrast TEM can be achieved using a cryogenic specimen stage. The use of such a stage allows one to increase the electron dose, which increases the signal-to-noise ratio.

[0065] Image Analysis

[0066] Images generated by TEM are recorded and processed as digital information by a computer which can identify heavy elements by signal intensity. The recording medium used for image analysis is selected to be appropriate for the magnified image produced by the TEM.

[0067] Assuming one base occupies an area of 0.7 nm×0.7 nm within the TEM image, that an allowance of 10 times the occupied area is given to the TEM image for reliable discrimination between bases, and that 10⁵ bases are captured in a single image, then one frame of an image has an actual area of 0.7×0.7×10×10⁵ nm² or about 0.5 µm². If each pixel is set to half the resolution (e.g., 0.15 nm), then the number of required pixels is (0.7×10³ nm/0.15 nm)² or about 2×10⁷. Thus, digital recording media such as CCDs and imaging plates (IPs) can be used for image capture. However, the size per pixel is preferably substantially equal to the pixel size of about 5 µm of high-resolution photographic film.
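The estimate above can be reproduced with a short calculation using the figures stated in the preceding paragraph: 0.7 nm × 0.7 nm per base, a tenfold area allowance, 10⁵ bases per image and 0.15 nm pixels. The function and parameter names are illustrative assumptions.

```python
def frame_estimate(base_side_nm=0.7, allowance=10,
                   bases_per_image=10**5, pixel_nm=0.15):
    """Return (frame area in square micrometres, number of pixels)."""
    # Area per base including the allowance, times the number of bases.
    area_nm2 = base_side_nm ** 2 * allowance * bases_per_image
    # Side length of a square frame with that area.
    side_nm = area_nm2 ** 0.5
    # Number of pixels needed to tile the frame at the chosen pixel size.
    pixels = (side_nm / pixel_nm) ** 2
    # 1 square micrometre = 1e6 square nanometres.
    return area_nm2 / 1e6, pixels


area_um2, pixels = frame_estimate()
print(f"frame area: {area_um2:.2f} square micrometres")  # about 0.5
print(f"pixels per frame: {pixels:.1e}")                # about 2e7
```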

[0068] The present analysis system is characterized in that it can analyze the sequence of single-stranded nucleic acid molecules even if the backbone is not completely elongated. For example, the sequence of a nucleic acid molecule can be determined even when the backbone is bent, has portions which appear to touch each other, or crosses over itself, forming an intersection. If the base spacing between chain portions close to an intersection is relatively large, the bases can be discriminated based on the atomic number-dependent intensity of the spot observed while tracing the backbone. Thus, the sequence can be determined.
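The discrimination step can be sketched as a nearest-neighbor assignment. The sketch assumes that the spots along the traced backbone have already been converted from signal intensity to an estimated atomic number; that conversion is not specified here and is not modeled. It uses the first example label combination given earlier (adenine-Pd, guanine-Pt, cytosine-Eu, uracil-Co) and reports the base in the imaged strand that pairs with each labeled base. All function names are hypothetical.

```python
# Label element -> (atomic number, base in the imaged strand that pairs
# with the labeled base). For example, a Pd-labeled A pairs with T.
LABELS = {
    "Pd": (46, "T"),  # labeled adenine
    "Eu": (63, "G"),  # labeled cytosine
    "Pt": (78, "C"),  # labeled guanine
    "Co": (27, "A"),  # labeled uracil
}


def call_base(estimated_z):
    """Assign a spot to the label whose atomic number is closest."""
    element = min(LABELS, key=lambda el: abs(LABELS[el][0] - estimated_z))
    return LABELS[element][1]


def call_sequence(estimated_zs):
    """Call one base per spot, in the order the backbone was traced."""
    return "".join(call_base(z) for z in estimated_zs)


# Hypothetical estimated atomic numbers for four consecutive spots.
print(call_sequence([45.2, 64.0, 77.1, 28.3]))  # TGCA
```

For an RNA strand, the spot assigned to the Pd-labeled adenine would be read as U rather than T.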

[0069] If the intersection created by backbone crossing is such that the backbone cannot be readily traced, two or more images of the intersection region are collected. The sample is tilted differently, for example by about 30°, for each image. In this manner depth information is obtained, and an image from which overlap has been removed is derived.
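One simple way to see how tilting yields depth information is the parallax relation sketched below. It assumes an idealized parallel projection, one untilted image and one image taken after rotating the sample by a known angle about the y axis, and a spot that has been matched between the two images. The sign convention, the function names and the use of only two views are assumptions made for illustration and are not a description of the reconstruction actually used.

```python
import math


def project_x(x, z, tilt_deg):
    """Projected x coordinate of a point (x, z) after tilting the sample
    by tilt_deg about the y axis (parallel projection along z)."""
    theta = math.radians(tilt_deg)
    return x * math.cos(theta) + z * math.sin(theta)


def depth_from_tilt(x_untilted, x_tilted, tilt_deg):
    """Recover the height z of a spot from its x position in the
    untilted image and in the tilted image."""
    theta = math.radians(tilt_deg)
    return (x_tilted - x_untilted * math.cos(theta)) / math.sin(theta)


# Two backbone segments cross at x = 1.0 nm but lie at different heights.
for true_z in (0.0, 0.5):
    x_tilted = project_x(1.0, true_z, 30.0)
    print(round(depth_from_tilt(1.0, x_tilted, 30.0), 6))
```

Spots that coincide in the untilted image but differ in recovered height can then be assigned to different backbone segments.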

[0070] Arrays

[0071] The nucleic acid sequencing method is suited to very high throughput analysis. As discussed above, this is because the method does not depend on the cloning of particular nucleic acid molecules. Indeed, it is possible to create a whole genome array on a 1 mm² grid by arranging, for example, rows of parallel, elongated single-stranded nucleic acid molecules. Each row contains, for example, 3000 nucleic acid molecules, each about 10,000 bases long (or about 7 µm long). Assuming a reasonable spacing between rows and between molecules in a single row, about 100 such rows could be held on a 1 mm² grid. Thus, such a grid could contain an entire genome or 3×10⁹ bases (100 rows×3000 molecules/row×10,000 bases/molecule). Such an arrangement could make it possible to analyze one genome equivalent of sequence per day (assuming 3×10⁵ bases analyzed/image×10,000 images/day).
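The capacity and throughput figures in the preceding paragraph follow from simple multiplication, as the sketch below shows. All input values are taken from the text; the 0.7 nm length per base is the per-base dimension assumed in the image analysis discussion, and the variable names are illustrative.

```python
rows = 100
molecules_per_row = 3000
bases_per_molecule = 10_000
nm_per_base = 0.7  # per-base dimension assumed in the image analysis section

# Capacity of one 1 mm^2 grid, in bases.
bases_per_grid = rows * molecules_per_row * bases_per_molecule

# Length of one elongated molecule, in micrometres (about 7).
molecule_length_um = bases_per_molecule * nm_per_base / 1000

# Daily throughput, in bases.
bases_per_image = 3 * 10**5
images_per_day = 10_000
bases_per_day = bases_per_image * images_per_day

print(bases_per_grid)                  # 3000000000
print(bases_per_day)                   # 3000000000
print(bases_per_grid / bases_per_day)  # 1.0 day per grid
```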

[0072] Arrays can be produced by adsorbing thiolated nucleic acid molecules to an array of gold dots. For example, gold dots having a diameter of less than 1 nm (e.g., 2, 5, 10, 50, 100 nm) can be regularly aligned on a carbon film as a rectangular lattice with 10, 20, 50, 100, 200, 300, 400 or 500 nm spacing or any other desirable spacing.

[0073] A reusable addressable array can be constructed by binding each of a plurality of short, specific oligonucleotides to a specific position on a surface, e.g., to a specific gold dot in an array of such dots. Each oligonucleotide would hybridize to a specific single-stranded sequence, thus creating an array of nucleic acid molecules at known locations.

[0074] The references cited herein are hereby incorporated by reference in their entirety.

[0075] A number of embodiments of the invention have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. Accordingly, other embodiments are within the scope of the following claims.

1. A method for determining the sequence of a plurality of bases in a nucleic acid molecule, comprising: (a) providing a support surface bearing a single stranded nucleic acid molecule comprising a plurality of bases labeled with one of four different base-specific labels; (b) detecting the base-specifically labeled bases by electron microscopy; and (c) analyzing the detected base-specifically labeled bases to determine the sequence of the nucleic acid molecule.
 2. The method of claim 1 wherein each base-specific label comprises at least one atom having an atomic number greater than 25.
 3. The method of claim 1 wherein each base is labeled with one of four base-specific labels, wherein each of said four base-specific labels comprises a different element having an atomic weight greater than 25 and each element having an atomic weight greater than 25 differs in atomic weight from each other element having an atomic weight greater than 25 by an atomic weight of at least 15.
 4. The method of claim 2 wherein the elements are selected from the group consisting of: Pt, Eu, Pd, Co, U, Os, Fe, Hg, Gd, Cd, Zn, Ac, W, Mo, and Mn.
 5. The method of claim 1 wherein the four base-specific labels consist of: (i) a label comprising a substituted adenine, (ii) a label comprising a substituted uracil or a substituted thymine, (iii) a label comprising a substituted cytosine, and (iv) a label comprising a substituted guanine. (substitution doesn't alter Watson-Crick base pairing)
 6. The method of claim 3 wherein the four elements are: (i) Pt, Eu, Pd, and Co; (ii) U, Os, Pd and Fe; (iii) Hg, Gd, Cd and Zn; or (iv) Ac, W, Mo and Mn.
 7. The method of claim 5 wherein N7 or N9 of the substituted adenine is substituted with a group comprising an element having an atomic weight greater than 25.
 8. The method of claim 5 wherein N7 or N9 of the substituted guanine is substituted with a group comprising an element having an atomic weight greater than 25.
 9. The method of claim 5 wherein N1 of the substituted cytosine is substituted with a group comprising an element having an atomic weight greater than 25.
 10. The method of claim 5 wherein N1 of the substituted uracil is substituted with a group comprising an element having an atomic weight greater than 25.
 11. The method of claim 5 wherein N1 of the substituted thymine is substituted with a group comprising an element having an atomic weight greater than 25.
 12. The method of claim 5 wherein the substituted adenine has a substitution that increases the specificity of Watson-Crick type base-pairing with uracil or thymine.
 13. The method of claim 5 wherein the substituted thymine or substituted uracil has a substitution that increases the specificity of Watson-Crick type base-pairing with adenine.
 14. The method of claim 5 wherein the substituted cytosine has a substitution that increases the specificity of Watson-Crick type base-pairing with guanine.
 15. The method of claim 5 wherein the substituted guanine has a substitution that increases the specificity of Watson-Crick type base-pairing with cytosine.
 16. The method of claim 12 wherein the substituted adenine is substituted at C2 or C8 or both C2 and C8.
 17. The method of claim 13 wherein the substituted thymine or the substituted uracil has a substitution at C5 or C6 or both C5 and C6.
 18. The method of claim 14 wherein the substituted cytosine or the substituted uracil has a substitution at C5 or C6 or both C5 and C6.
 19. The method of claim 12 wherein the substituted guanine is substituted at C2.
 20. The method of any of claims 16-19 wherein the modification is a substituted or unsubstituted alkyl group or a halogen.
 21. The method of claim 1 wherein the 3′ nucleotide or the 5′ nucleotide of the nucleic acid molecule is covalently bound to a defined region of the support surface.
 22. The method of claim 21 wherein the defined region of the support surface is coated with gold.
 23. The method of claim 1 wherein the support surface bears a plurality of single-stranded nucleic acid molecules wherein the 3′ nucleotide or the 5′ nucleotide of each nucleic acid molecule is covalently bound to a defined region of the support surface.
 24. The method of claim 23 wherein each of said plurality of single-stranded nucleic acid molecules is covalently bound to a different defined region of the support surface.
 25. The method of claim 2 wherein each base-specific label comprises at least 2 atoms having an atomic number greater than 25.
 26. The method of claim 25 wherein each base-specific label comprises a group selected from the group consisting of: B₁₀I₉COOH, C₂B₁₀I₁₀, C₂B₁₀Br₁₀, C₂B₁₀Cl₁₀, C₂B₁₀F₁₀.
 27. The method of claim 1 wherein the electron microscopy is transmission electron microscopy.
 28. The method of claim 27 wherein the transmission electron microscopy includes the use of a transmission electron microscope comprising a Zernike phase plate located behind the objective lens.
 29. The method of claim 1 wherein the electron microscopy comprises the use of a complex electron microscope.
 30. The method of claim 24 wherein the plurality of nucleic acid molecules form a regular array.
 31. A method for determining the sequence of a plurality of bases in a nucleic acid molecule, comprising: (a) providing a support surface bearing a single stranded nucleic acid molecule comprising a plurality of bases; (b) contacting the single stranded nucleic acid molecule with four different base-specific labels in the presence of an organic solvent that is free of hydrogen bond donors and acceptors to allow non-covalent binding of the four different base-specific labels to the plurality of bases; (c) detecting the base-specifically labeled bases by electron microscopy; and (d) analyzing the detected base-specifically labeled bases to determine the sequence of the nucleic acid molecule.
 32. A substituted adenine wherein the C2 or C8 position is substituted with a C₁-C₅ alkyl group or a halogen and the N7 or N9 position is substituted with a group comprising an element having an atomic weight greater than 25.
 33. A substituted guanine wherein the C8 position is substituted with a C₁-C₅ alkyl group or a halogen and the N7 or N9 position is substituted with a group comprising an element having an atomic weight greater than 25.
 34. A substituted uracil wherein the C5 or C6 position is substituted with a C₁-C₅ alkyl group or a halogen and the N1 position is substituted with a group comprising an element having an atomic weight greater than 25.
 35. A substituted cytosine wherein the C5 or C6 position is substituted with a C₁-C₅ alkyl group or a halogen and the N1 position is substituted with a group comprising an element having an atomic weight greater than 25.
 36. The substituted adenine of claim 32 wherein the element having an atomic weight greater than 25 is selected from the group consisting of: Pt, Eu, Pd, Co, U, Os, Fe, Hg, Gd, Cd, Zn, Ac, W, Mo, and Mn.
 37. The substituted guanine of claim 33 wherein the element having an atomic weight greater than 25 is selected from the group consisting of: Pt, Eu, Pd, Co, U, Os, Fe, Hg, Gd, Cd, Zn, Ac, W, Mo, and Mn.
 38. The substituted uracil of claim 34 wherein the element having an atomic weight greater than 25 is selected from the group consisting of: Pt, Eu, Pd, Co, U, Os, Fe, Hg, Gd, Cd, Zn, Ac, W, Mo, and Mn.
 39. The substituted cytosine of claim 35 wherein the element having an atomic weight greater than 25 is selected from the group consisting of: Pt, Eu, Pd, Co, U, Os, Fe, Hg, Gd, Cd, Zn, Ac, W, Mo, and Mn.